Goto

Collaborating Authors

 ds data


Augmenting Document-level Relation Extraction with Efficient Multi-Supervision

Lin, Xiangyu, Jia, Weijia, Gong, Zhiguo

arXiv.org Artificial Intelligence

Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely utilized by existing work in document-level relation extraction due to its noisy nature and low information density. Among its current applications, distantly supervised data is mostly used as a whole for pertaining, which is of low time efficiency. To fill in the gap of efficient and robust utilization of distantly supervised training data, we propose Efficient Multi-Supervision for document-level relation extraction, in which we first select a subset of informative documents from the massive dataset by combining distant supervision with expert supervision, then train the model with Multi-Supervision Ranking Loss that integrates the knowledge from multiple sources of supervision to alleviate the effects of noise. The experiments demonstrate the effectiveness of our method in improving the model performance with higher time efficiency than existing baselines.


Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction

Sun, Qi, Huang, Kun, Yang, Xiaocui, Hong, Pengfei, Zhang, Kun, Poria, Soujanya

arXiv.org Artificial Intelligence

Document-level relation extraction (DocRE) aims to infer complex semantic relations among entities in a document. Distant supervision (DS) is able to generate massive auto-labeled data, which can improve DocRE performance. Recent works leverage pseudo labels generated by the pre-denoising model to reduce noise in DS data. However, unreliable pseudo labels bring new noise, e.g., adding false pseudo labels and losing correct DS labels. Therefore, how to select effective pseudo labels to denoise DS data is still a challenge in document-level distant relation extraction. To tackle this issue, we introduce uncertainty estimation technology to determine whether pseudo labels can be trusted. In this work, we propose a Document-level distant Relation Extraction framework with Uncertainty Guided label denoising, UGDRE. Specifically, we propose a novel instance-level uncertainty estimation method, which measures the reliability of the pseudo labels with overlapping relations. By further considering the long-tail problem, we design dynamic uncertainty thresholds for different types of relations to filter high-uncertainty pseudo labels. We conduct experiments on two public datasets. Our framework outperforms strong baselines by 1.91 F1 and 2.28 Ign F1 on the RE-DocRED dataset.


Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision

Wan, Zhen, Cheng, Fei, Liu, Qianying, Mao, Zhuoyuan, Song, Haiyue, Kurohashi, Sadao

arXiv.org Artificial Intelligence

Contrastive pre-training on distant supervision has shown remarkable effectiveness in improving supervised relation extraction tasks. However, the existing methods ignore the intrinsic noise of distant supervision during the pre-training stage. In this paper, we propose a weighted contrastive learning method by leveraging the supervised data to estimate the reliability of pre-training instances and explicitly reduce the effect of noise. Experimental results on three supervised datasets demonstrate the advantages of our proposed weighted contrastive learning approach compared to two state-of-the-art non-weighted baselines.Our code and models are available at: https://github.com/YukinoWan/WCL


Denoising Enhanced Distantly Supervised Ultrafine Entity Typing

Zhang, Yue, Fei, Hongliang, Li, Ping

arXiv.org Artificial Intelligence

Recently, the task of distantly supervised (DS) ultra-fine entity typing has received significant attention. However, DS data is noisy and often suffers from missing or wrong labeling issues resulting in low precision and low recall. This paper proposes a novel ultra-fine entity typing model with denoising capability. Specifically, we build a noise model to estimate the unknown labeling noise distribution over input contexts and noisy type labels. With the noise model, more trustworthy labels can be recovered by subtracting the estimated noise from the input. Furthermore, we propose an entity typing model, which adopts a bi-encoder architecture, is trained on the denoised data. Finally, the noise model and entity typing model are trained iteratively to enhance each other. We conduct extensive experiments on the Ultra-Fine entity typing dataset as well as OntoNotes dataset and demonstrate that our approach significantly outperforms other baseline methods.


Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation

Jung, Woohwan, Shim, Kyuseok

arXiv.org Artificial Intelligence

Relation extraction (RE) has been extensively studied due to its importance in real-world applications such as knowledge base construction and question answering. Most of the existing works train the models on either distantly supervised data or human-annotated data. To take advantage of the high accuracy of human annotation and the cheap cost of distant supervision, we propose the dual supervision framework which effectively utilizes both types of data. However, simply combining the two types of data to train a RE model may decrease the prediction accuracy since distant supervision has labeling bias. We employ two separate prediction networks HA-Net and DS-Net to predict the labels by human annotation and distant supervision, respectively, to prevent the degradation of accuracy by the incorrect labeling of distant supervision. Furthermore, we propose an additional loss term called disagreement penalty to enable HA-Net to learn from distantly supervised labels. In addition, we exploit additional networks to adaptively assess the labeling bias by considering contextual information. Our performance study on sentence-level and document-level REs confirms the effectiveness of the dual supervision framework.